Implement HostVector for more robust CPU <-> GPU data transfers #93

Merged
jipolanco merged 12 commits into master from host-array on Feb 3, 2026

@jipolanco (Owner) commented Feb 2, 2026

`HostVector` is used as a staging area in CPU memory during host-device transfers. On CUDA and AMDGPU this memory is page-locked (pinned) for faster transfers. We also avoid reallocating this CPU memory as much as possible, since a reallocation (which typically happens when the number of filament points changes) requires redoing the page-lock. This seemed to cause crashes on AMDGPU in particular, which now seem to be fixed. Moreover, multi-GPU behaviour is now more robust (on CUDA in particular, where there were some issues).
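To illustrate the reallocation-avoidance idea described above, here is a minimal sketch in Python (not the package's Julia code, and not the actual `HostVector` implementation). The class `StagingBuffer`, its methods, and the 1.5 growth factor are hypothetical names and choices made for illustration. No memory is actually page-locked here; the sketch only counts how often the backing storage is replaced, which is when a real GPU backend would have to pin the memory again.

```python
from array import array


class StagingBuffer:
    """Grow-only host staging buffer (hypothetical stand-in for a pinned CPU array)."""

    def __init__(self, typecode="d", growth=1.5):
        self._storage = array(typecode)  # backing memory; its length is the capacity
        self._length = 0                 # number of elements currently in use
        self._growth = growth            # assumed growth factor, chosen arbitrarily
        self.reallocations = 0           # each one would require re-pinning on a GPU backend

    @property
    def capacity(self):
        return len(self._storage)

    def __len__(self):
        return self._length

    def resize(self, n):
        """Set the logical length to n, replacing the storage only if it is too small."""
        if n > self.capacity:
            new_capacity = max(n, int(self.capacity * self._growth))
            typecode = self._storage.typecode
            # Zero-initialised storage of the new capacity.
            new_storage = array(typecode, bytes(new_capacity * self._storage.itemsize))
            new_storage[: self._length] = self._storage[: self._length]
            self._storage = new_storage
            self.reallocations += 1
        # Shrinking only changes the logical length; the storage is kept.
        self._length = n

    def load(self, values):
        """Copy values into the buffer, as one would before a host-to-device transfer."""
        self.resize(len(values))
        self._storage[: self._length] = array(self._storage.typecode, values)

    def view(self):
        """Return a view of the elements in use, without copying."""
        return memoryview(self._storage)[: self._length]


# The number of points fluctuates between steps, but the storage is replaced rarely.
buf = StagingBuffer()
for n in [100, 120, 90, 110, 150, 100]:
    buf.load([float(i) for i in range(n)])
print(len(buf), buf.capacity, buf.reallocations)
```

In this sketch the storage is replaced only when the requested length exceeds the current capacity, so shrinking and regrowing within the capacity costs no reallocation. How the real `HostVector` decides when to grow is not stated in this PR description.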

github-actions bot (Contributor) commented Feb 2, 2026

Benchmark Results (Julia v1)

Time benchmarks

| | master | 80bea42... | master / 80bea42... |
|---|---|---|---|
| BiotSavart/add_local_integrals! | 13.5 ± 0.99 ms | 13.4 ± 0.86 ms | 1.01 ± 0.099 |
| BiotSavart/add_point_charges! | 12.8 ± 0.44 ms | 12.8 ± 0.27 ms | 1 ± 0.04 |
| BiotSavart/velocity | 0.593 ± 0.008 s | 0.611 ± 0.01 s | 0.97 ± 0.021 |
| BiotSavart/velocity + streamfunction | 0.721 ± 0.011 s | 0.729 ± 0.0037 s | 0.988 ± 0.015 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted) | 0.0824 ± 0.0037 s | 0.0816 ± 0.0032 s | 1.01 ± 0.061 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted) | 0.106 ± 0.011 s | 0.0975 ± 0.0066 s | 1.09 ± 0.13 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (sorted) | 0.0602 ± 0.0021 s | 0.0599 ± 0.0022 s | 1.01 ± 0.051 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted) | 0.0834 ± 0.024 s | 0.0766 ± 0.0064 s | 1.09 ± 0.33 |
| CellLists/CPU/nsubdiv = 1/foreach_source | 0.0745 ± 0.0077 s | 0.0712 ± 0.0046 s | 1.05 ± 0.13 |
| CellLists/CPU/nsubdiv = 1/iterator_interface | 0.0936 ± 0.0062 s | 0.122 ± 0.024 s | 0.77 ± 0.16 |
| CellLists/CPU/nsubdiv = 1/set_elements! | 1.98 ± 0.082 ms | 1.99 ± 0.081 ms | 0.996 ± 0.058 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted) | 0.112 ± 0.0066 s | 0.107 ± 0.0024 s | 1.05 ± 0.065 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted) | 0.276 ± 0.029 s | 0.287 ± 0.012 s | 0.963 ± 0.11 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (sorted) | 0.0993 ± 0.0093 s | 0.105 ± 0.0045 s | 0.948 ± 0.098 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted) | 0.276 ± 0.025 s | 0.259 ± 0.0078 s | 1.06 ± 0.1 |
| CellLists/CPU/nsubdiv = 2/foreach_source | 0.283 ± 0.0097 s | 0.273 ± 0.024 s | 1.04 ± 0.097 |
| CellLists/CPU/nsubdiv = 2/iterator_interface | 0.34 ± 0.0081 s | 0.378 ± 0.014 s | 0.898 ± 0.04 |
| CellLists/CPU/nsubdiv = 2/set_elements! | 5.88 ± 0.73 ms | 7.07 ± 0.68 ms | 0.831 ± 0.13 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted) | 0.0648 ± 0.0015 s | 0.0672 ± 0.0032 s | 0.965 ± 0.051 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted) | 0.0666 ± 0.0037 s | 0.0669 ± 0.0014 s | 0.996 ± 0.059 |
| CellLists/OpenCLBackend/nsubdiv = 1/set_elements! | 2.32 ± 0.06 ms | 2.33 ± 0.061 ms | 1 ± 0.037 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted) | 0.183 ± 0.0018 s | 0.185 ± 0.0035 s | 0.991 ± 0.021 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted) | 0.215 ± 0.0079 s | 0.212 ± 0.0044 s | 1.02 ± 0.043 |
| CellLists/OpenCLBackend/nsubdiv = 2/set_elements! | 7.51 ± 0.54 ms | 7.49 ± 0.37 ms | 1 ± 0.087 |
| Diagnostics/energy_flux | 1.15 ± 0.0023 s | 1.14 ± 0.009 s | 1 ± 0.0082 |
| Diagnostics/energy_injection_rate | 8.65 ± 0.15 ms | 8.65 ± 0.097 ms | 1 ± 0.02 |
| Diagnostics/energy_spectrum | 2.26 ± 0.095 ms | 2.26 ± 0.11 ms | 1 ± 0.064 |
| Diagnostics/energy_transfer_matrix | 1.56 ± 0.1 s | 1.44 ± 0.073 s | 1.09 ± 0.091 |
| Diagnostics/helicity | 5.51 ± 0.087 ms | 5.5 ± 0.041 ms | 1 ± 0.018 |
| Diagnostics/kinetic_energy | 0.875 ± 0.015 ms | 0.879 ± 0.014 ms | 0.996 ± 0.023 |
| Reconnections/ReconnectBasedOnDistance | 3.26 ± 0.074 s | 3.27 ± 0.087 s | 0.997 ± 0.035 |
| Reconnections/ReconnectFast | 0.266 ± 0.019 s | 0.26 ± 0.011 s | 1.02 ± 0.085 |
| Refinement/RefineBasedOnSegmentLength | 7.21 ± 2.3 ms | 6.91 ± 1.8 ms | 1.04 ± 0.44 |
| Timestepping/forcing | 28.2 ± 3.9 ms | 22.3 ± 3.1 ms | 1.27 ± 0.25 |
| Timestepping/step! | 2.8 ± 0.072 s | 2.8 ± 0.06 s | 1 ± 0.033 |
| time_to_load | 1.63 ± 0.00093 s | 1.64 ± 0.0054 s | 0.995 ± 0.0033 |

Memory benchmarks

| | master | 80bea42... | master / 80bea42... |
|---|---|---|---|
| BiotSavart/add_local_integrals! | 7.04 k allocs: 4.17 MB | 7.04 k allocs: 4.16 MB | 1 |
| BiotSavart/add_point_charges! | 6.04 k allocs: 0.693 MB | 6.04 k allocs: 0.677 MB | 1.02 |
| BiotSavart/velocity | 15.6 k allocs: 4.97 MB | 14.1 k allocs: 4.88 MB | 1.02 |
| BiotSavart/velocity + streamfunction | 16 k allocs: 5.07 MB | 14.6 k allocs: 4.98 MB | 1.02 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (SIMD/unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_pair (unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 1/foreach_source | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 1/iterator_interface | 22 allocs: 2.16 kB | 1.2 M allocs: 0.0834 GB | 2.46e-05 |
| CellLists/CPU/nsubdiv = 1/set_elements! | 0.044 k allocs: 3.69 kB | 0.044 k allocs: 3.69 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (SIMD/unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (sorted) | 0.066 k allocs: 5.97 kB | 0.066 k allocs: 5.97 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_pair (unsorted) | 22 allocs: 2.22 kB | 22 allocs: 2.22 kB | 1 |
| CellLists/CPU/nsubdiv = 2/foreach_source | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 2/iterator_interface | 22 allocs: 2.16 kB | 22 allocs: 2.16 kB | 1 |
| CellLists/CPU/nsubdiv = 2/set_elements! | 0.044 k allocs: 3.69 kB | 0.044 k allocs: 3.69 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (sorted) | 0.641 k allocs: 0.0398 MB | 0.641 k allocs: 0.0398 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/foreach_pair (unsorted) | 0.122 k allocs: 9.11 kB | 0.122 k allocs: 9.11 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 1/set_elements! | 0.731 k allocs: 0.0433 MB | 0.731 k allocs: 0.0433 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (sorted) | 0.641 k allocs: 0.0398 MB | 0.641 k allocs: 0.0398 MB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/foreach_pair (unsorted) | 0.122 k allocs: 9.11 kB | 0.122 k allocs: 9.11 kB | 1 |
| CellLists/OpenCLBackend/nsubdiv = 2/set_elements! | 0.731 k allocs: 0.0433 MB | 0.731 k allocs: 0.0433 MB | 1 |
| Diagnostics/energy_flux | 0.0556 M allocs: 8.59 MB | 0.0556 M allocs: 8.59 MB | 1 |
| Diagnostics/energy_injection_rate | 2.06 k allocs: 0.203 MB | 2.06 k allocs: 0.203 MB | 1 |
| Diagnostics/energy_spectrum | 26 allocs: 3.22 kB | 26 allocs: 3.22 kB | 1 |
| Diagnostics/energy_transfer_matrix | 25 k allocs: 0.0392 GB | 25 k allocs: 0.0392 GB | 1 |
| Diagnostics/helicity | 1.71 k allocs: 0.0703 MB | 1.71 k allocs: 0.0703 MB | 1 |
| Diagnostics/kinetic_energy | 2.05 k allocs: 0.0805 MB | 2.05 k allocs: 0.0805 MB | 1 |
| Reconnections/ReconnectBasedOnDistance | 4.18 M allocs: 0.312 GB | 4.18 M allocs: 0.312 GB | 1 |
| Reconnections/ReconnectFast | 7.49 M allocs: 0.196 GB | 7.49 M allocs: 0.196 GB | 1 |
| Refinement/RefineBasedOnSegmentLength | 5.59 k allocs: 0.263 MB | 5.59 k allocs: 0.263 MB | 1 |
| Timestepping/forcing | 2.55 k allocs: 0.126 MB | 2.04 k allocs: 0.102 MB | 1.24 |
| Timestepping/step! | 2.25 M allocs: 0.066 GB | 2.24 M allocs: 0.0655 GB | 1.01 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |

codecov-commenter commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 78.45304% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.98%. Comparing base (fe398de) to head (80bea42).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ext/VortexPastaAMDGPUExt.jl | 0.00% | 14 Missing ⚠️ |
| ext/VortexPastaCUDAExt.jl | 0.00% | 14 Missing ⚠️ |
| src/BiotSavart/host_device_transfers.jl | 88.15% | 9 Missing ⚠️ |
| src/BiotSavart/BiotSavart.jl | 96.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #93      +/-   ##
==========================================
- Coverage   93.32%   92.98%   -0.34%     
==========================================
  Files         122      126       +4     
  Lines        8280     8407     +127     
==========================================
+ Hits         7727     7817      +90     
- Misses        553      590      +37     

@jipolanco merged commit 42f5142 into master on Feb 3, 2026
5 checks passed
@jipolanco deleted the host-array branch on February 3, 2026 09:54